Scientific computing applications have benefited greatly from high-performance computing infrastructure such as supercomputers. However, we are seeing a paradigm shift in the computational structure, design, and requirements of these applications. Increasingly, data-driven and machine learning approaches are being used to support, accelerate, and enhance scientific computing applications, especially molecular dynamics simulations. At the same time, cloud computing platforms are increasingly attractive for scientific computing, offering "infinite" compute power, easier programming and deployment of models, and access to computing accelerators such as TPUs (Tensor Processing Units). This confluence of machine learning (ML) and cloud computing represents an exciting opportunity for cloud and systems researchers. ML-assisted molecular dynamics simulations are a new class of workload with unique computational patterns, and they pose new challenges for low-cost and high-performance execution. We argue that transient cloud resources, such as low-cost preemptible cloud VMs, can be a viable platform for this new workload. Finally, we present some low-hanging fruit and long-term challenges in cloud resource management and in integrating molecular dynamics simulations with ML platforms (such as TensorFlow).
Classical molecular dynamics simulations are based on solving Newton's equations of motion. Using a small-timestep numerical integrator such as Verlet, trajectories of particles are generated as solutions of Newton's equations. We introduce operators derived using recurrent neural networks that accurately solve Newton's equations using sequences of past trajectory data, and that produce energy-conserving dynamics of particles with timesteps up to 4000 times larger than the Verlet timestep. We demonstrate significant speedup on a variety of example problems, including 3D systems of up to 16 particles.
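The baseline the abstract refers to, the velocity-Verlet integrator, is standard. A minimal sketch for a 1D harmonic oscillator (illustrative parameters; this is the classical integrator, not the paper's RNN operator) might look like:

```python
import math

def velocity_verlet(x0, v0, accel, dt, steps):
    """Integrate Newton's equations of motion with the velocity-Verlet scheme."""
    x, v = float(x0), float(v0)
    a = accel(x)
    traj = [x]
    for _ in range(steps):
        x += v * dt + 0.5 * a * dt * dt   # position update
        a_new = accel(x)                  # force evaluation at the new position
        v += 0.5 * (a + a_new) * dt       # velocity update with averaged acceleration
        a = a_new
        traj.append(x)
    return traj

# Harmonic oscillator with unit mass and spring constant: a(x) = -x,
# exact solution x(t) = cos(t) for x0 = 1, v0 = 0.
traj = velocity_verlet(x0=1.0, v0=0.0, accel=lambda x: -x, dt=0.01, steps=1000)
```

The RNN operator in the abstract replaces the per-step force evaluation above with a learned map from a window of past trajectory points to the state one (much larger) step ahead, which is where the claimed up-to-4000x larger timestep comes from.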
Computed tomography (CT) has been routinely used for the diagnosis of lung diseases and, recently during the pandemic, for detecting the infectivity and severity of COVID-19. One of the major concerns in using machine learning (ML) approaches for automatic processing of CT scan images in clinical settings is that these methods are trained on limited and biased subsets of publicly available COVID-19 data. This has raised concerns regarding the generalizability of these models on external datasets not seen by the model during training. To address some of these issues, in this work CT scan images of confirmed COVID-19 cases obtained from one of the largest public repositories, COVIDx CT 2A, were used for training and internal validation of machine learning models. For external validation we generated the Indian-COVID-19 CT dataset, an open-source repository containing 3D CT volumes and 12,096 chest CT images from 288 COVID-19 patients from India. A comparative performance evaluation of four state-of-the-art machine learning models, viz., a lightweight convolutional neural network (CNN) and three CNN-based deep learning (DL) models (VGG-16, ResNet-50, and Inception-v3), in classifying CT images into three classes (normal, non-COVID pneumonia, and COVID-19) is carried out on these two datasets. Our analysis shows that the performance of all the models is comparable on the hold-out COVIDx CT 2A test set, with 90%-99% accuracies (96% for the CNN), while on the external Indian-COVID-19 CT dataset a drop in performance is observed for all the models (8%-19%). The lightweight CNN performed best on the external dataset (accuracy 88%) in comparison to the deep learning models, indicating that a lightweight CNN generalizes better to unseen data. The data and code are made available at https://github.com/aleesuss/c19.
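The generalization claim reduces to a simple accuracy-gap comparison between internal and external validation. A small sketch (hypothetical helper names, using the accuracies reported above for the lightweight CNN) illustrates the arithmetic:

```python
def accuracy(y_true, y_pred):
    """Fraction of correct predictions over the three classes."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

def generalization_drop(internal_acc, external_acc):
    """Absolute accuracy drop from internal (hold-out) to external validation."""
    return internal_acc - external_acc

# Lightweight CNN: 96% on the COVIDx CT 2A hold-out set,
# 88% on the external Indian-COVID-19 CT dataset.
cnn_drop = generalization_drop(0.96, 0.88)  # the smallest drop among the four models
```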
Neoplasms (NPs) and neurological diseases and disorders (NDDs) are amongst the major classes of diseases underlying the deaths of a disproportionate number of people worldwide. To determine whether distinctive features exist in the local wiring patterns of protein interactions emerging at the onset of a disease belonging to either of these two classes, we examined 112 and 175 protein interaction networks belonging to NPs and NDDs, respectively. Orbit usage profiles (OUPs) for each of these networks were enumerated by investigating the networks' local topology. 56 non-redundant OUPs (nrOUPs) were derived and used as network features for classification between these two disease classes. Four machine learning classifiers, namely k-nearest neighbour (KNN), support vector machine (SVM), deep neural network (DNN), and random forest (RF), were trained on these data. The DNN obtained the greatest average AUPRC (0.988) among these classifiers. DNNs developed on node2vec embeddings and the proposed nrOUP embeddings were compared using 5-fold cross-validation on the basis of the average values of six performance measures, viz., AUPRC, accuracy, sensitivity, specificity, precision, and MCC. The nrOUP-based classifier performed better on all six measures.
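As a sketch of the pipeline, assuming per-node orbit counts have already been computed by an external graphlet counter (the counting itself is not shown), a network-level profile and a plain KNN vote could look like:

```python
import math
from collections import Counter

def orbit_usage_profile(node_orbit_counts):
    """Aggregate per-node orbit counts into a normalized, network-level
    orbit usage profile (one entry per orbit)."""
    totals = [sum(col) for col in zip(*node_orbit_counts)]
    s = sum(totals)
    return [t / s for t in totals] if s else totals

def knn_predict(train_X, train_y, x, k=3):
    """Plain k-nearest-neighbour vote with Euclidean distance."""
    neighbours = sorted(
        (math.dist(row, x), label) for row, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in neighbours[:k])
    return votes.most_common(1)[0][0]
```

In the paper the profiles are 56-dimensional (one entry per non-redundant orbit) and the best-performing classifier is a DNN rather than KNN; KNN is shown here only because it fits in a few lines.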
We consider the problem of continually releasing an estimate of the population mean of a stream of samples that is user-level differentially private (DP). At each time instant, a user contributes a sample, and the users can arrive in arbitrary order. Until now, the requirements of continual release and user-level privacy were considered in isolation. In practice, however, both requirements arise together, as users often contribute data repeatedly and multiple queries are made. We provide an algorithm that outputs a mean estimate at every time instant $t$ such that the overall release is user-level $\varepsilon$-DP and has the following error guarantee: denoting by $M_t$ the maximum number of samples contributed by a user, as long as $\tilde{\Omega}(1/\varepsilon)$ users have $M_t/2$ samples each, the error at time $t$ is $\tilde{O}(1/\sqrt{t}+\sqrt{M_t}/t\varepsilon)$. This is a universal error guarantee, valid for all arrival patterns of the users. Furthermore, it (almost) matches the existing lower bounds for the single-release setting at all time instants when users have contributed an equal number of samples.
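As a point of reference, the classical single-release baseline that the paper's continual-release guarantee is compared against is the Laplace mechanism on a bounded mean, with one sample per user. A minimal sketch (this is not the paper's algorithm):

```python
import math
import random

def dp_mean(samples, epsilon, lower=0.0, upper=1.0, rng=random):
    """Single-release epsilon-DP estimate of the mean of bounded samples.
    With one sample per user, changing one user's data moves the mean by
    at most (upper - lower) / n, which sets the Laplace noise scale."""
    n = len(samples)
    clipped = [min(max(s, lower), upper) for s in samples]
    true_mean = sum(clipped) / n
    scale = (upper - lower) / (n * epsilon)
    # Sample Laplace(0, scale) by inverse CDF from u ~ Uniform(-0.5, 0.5)
    u = rng.random() - 0.5
    noise = -scale * math.copysign(math.log(1.0 - 2.0 * abs(u)), u)
    return true_mean + noise
```

The setting in the abstract is harder: an estimate is released at every time instant, and a user may contribute up to $M_t$ samples, which inflates the user-level sensitivity and leads to the stated $\tilde{O}(1/\sqrt{t}+\sqrt{M_t}/t\varepsilon)$ error.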
Creating high-performance generalizable deep neural networks for phytoplankton monitoring requires utilizing large-scale data coming from diverse global water sources. A major challenge to training such networks lies in data privacy, where data collected at different facilities are often restricted from being transferred to a centralized location. A promising approach to overcome this challenge is federated learning, where training is done at site level on local data, and only the model parameters are exchanged over the network to generate a global model. In this study, we explore the feasibility of leveraging federated learning for privacy-preserving training of deep neural networks for phytoplankton classification. More specifically, we simulate two different federated learning frameworks, federated learning (FL) and mutually exclusive FL (ME-FL), and compare their performance to a traditional centralized learning (CL) framework. Experimental results from this study demonstrate the feasibility and potential of federated learning for phytoplankton monitoring.
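The aggregation step in an FL framework of the kind described here is typically FedAvg-style: each site trains locally, and only parameter vectors are combined. A minimal sketch (hypothetical parameter vectors, sample-count weighting):

```python
def fedavg(client_params, client_sizes):
    """Sample-count-weighted average of per-site model parameter vectors.
    Only these vectors, never the local phytoplankton images, leave a site."""
    total = sum(client_sizes)
    dim = len(client_params[0])
    return [
        sum(params[i] * n for params, n in zip(client_params, client_sizes)) / total
        for i in range(dim)
    ]

# Two sites: the first holds three times as much local data as the second.
global_params = fedavg([[0.0, 0.0], [4.0, 4.0]], client_sizes=[3, 1])
```

In practice this averaging runs once per communication round, with each site resuming local training from the freshly averaged global model.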
This paper considers adaptive radar electronic counter-countermeasures (ECCM) to mitigate electronic countermeasures (ECM) by an adversarial jammer. Our ECCM approach models the jammer-radar interaction as a Principal Agent Problem (PAP), a popular economics framework for interaction between two entities with an information imbalance. In our setup, the radar does not know the jammer's utility. Instead, the radar learns the jammer's utility adaptively over time using inverse reinforcement learning. The radar's adaptive ECCM objective is two-fold: (1) maximize its utility by solving the PAP, and (2) estimate the jammer's utility by observing its response. Our adaptive ECCM scheme draws on ideas from revealed preference in microeconomics and the principal-agent problem in contract theory. Our numerical results show that, over time, our adaptive ECCM both identifies and mitigates the jammer's utility.
Causal and attribution studies are essential for earth-science discovery and crucial for informing climate, ecosystem, and water policy. However, current methods face challenges arising from the complexity of the science and the stakeholders, as well as from data availability and the adequacy of data-driven methods. Unless carefully informed by physics, these methods risk conflating correlation with causation or being overwhelmed by estimation inaccuracies. Given that natural experiments, controlled trials, interventions, and counterfactual examinations are often impractical, information-theoretic methods have been developed and are being continually refined in the earth sciences. Here we show that causal graphs based on transfer entropy, which have recently become popular in the earth sciences and have produced high-profile discoveries, can be spurious even when augmented with statistical significance tests. We develop a subsample-based ensemble approach for robust causal analysis. Simulated data, along with observations in climate and ecohydrology, demonstrate the robustness and consistency of this approach.
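A minimal sketch of the subsample-ensemble idea, with a simple lagged correlation standing in for the transfer-entropy estimator the paper actually uses, checks whether a candidate causal link survives across many subsamples of the record:

```python
import math
import random

def lagged_corr(x, y, lag=1):
    """Pearson correlation between x[t - lag] and y[t]; a cheap stand-in
    here for a directed dependence measure such as transfer entropy."""
    xs, ys = x[:-lag], y[lag:]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    vx = sum((a - mx) ** 2 for a in xs)
    vy = sum((b - my) ** 2 for b in ys)
    return cov / math.sqrt(vx * vy)

def link_robustness(x, y, stat, n_sub=50, frac=0.7, threshold=0.2, seed=0):
    """Fraction of contiguous subsamples on which |stat| clears the threshold.
    Links with low robustness are flagged as potentially spurious even if
    they pass a significance test on the full record."""
    rng = random.Random(seed)
    m = int(len(x) * frac)
    hits = 0
    for _ in range(n_sub):
        start = rng.randrange(len(x) - m + 1)
        if abs(stat(x[start:start + m], y[start:start + m])) > threshold:
            hits += 1
    return hits / n_sub
```

A genuinely driven pair scores near 1 while an independent pair scores near 0, which is the consistency criterion the ensemble approach relies on.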
HELM (Help Everyone Learn More) is the first online peer-to-peer learning platform that allows students (usually middle and high school students) to teach classes that other students (usually elementary school students) can attend for free. This class structure, peer-to-peer learning, has been shown to be effective for learning, as it promotes teamwork and collaboration and enables active learning. HELM is a unique platform because it gives students a simple process to create, teach, and learn topics in a structured, peer-to-peer environment. Since its creation in April 2020, HELM has reached over 4,000 students and 80 teachers across four continents. HELM has evolved from a simple website and Google Forms platform into a back-end system coded in Python, SQL, JavaScript, and HTML and hosted on AWS services. This not only makes registration easier for students (since student information is saved in a SQL database, they can register for classes without providing their information again and receive automated emails about classes), but also makes teaching easier for teachers (by streamlining processes such as creating Zoom links and class recording folders, sending emails to students, etc.). In addition, HELM has a machine learning recommendation algorithm that suggests classes and subjects a student might like to attend based on the classes the student has previously taken. This gives students an easier experience registering for classes they are interested in.
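HELM's actual recommendation algorithm is not described here; a hypothetical content-based sketch (subject-tag Jaccard similarity against a student's past classes, with made-up class names) conveys the idea:

```python
def recommend(history, catalog, k=3):
    """Suggest classes whose subject tags best overlap (Jaccard similarity)
    with the subjects of classes the student has already taken."""
    taken_subjects = set()
    for name in history:
        taken_subjects |= catalog[name]
    ranked = []
    for name, tags in catalog.items():
        if name in history:
            continue  # never re-recommend a class already taken
        union = tags | taken_subjects
        score = len(tags & taken_subjects) / len(union) if union else 0.0
        ranked.append((score, name))
    ranked.sort(reverse=True)
    return [name for _, name in ranked[:k]]

# Hypothetical catalog mapping class names to subject tags.
catalog = {
    "Algebra I": {"math"},
    "Geometry": {"math"},
    "Intro Python": {"cs", "math"},
    "Biology": {"science"},
}
suggestions = recommend(["Algebra I"], catalog, k=2)
```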
This paper proposes a novel non-intrusive system-failure prediction technique that uses information available from developers, along with minimal information from raw logs (rather than mining entire logs), while keeping the data entirely with the data owner. A neural-network-based multi-class classifier is developed for failure prediction, using artificially generated anonymized datasets and applying a combination of techniques such as genetic algorithms and pattern repetition to train and test the network. The proposed mechanism completely decouples the dataset used for training from the data that is kept private. In addition, a multi-criteria decision-making (MCDM) scheme is used to prioritize failures that matter to business requirements. Results show the failure-prediction accuracy under different parameter configurations. In a broader context, the proposed mechanism can be used to perform any classification task with artificially generated datasets, without ever looking at the actual data, as long as the input features can be converted to binary values (e.g., outputs from a private binary classifier) and classification can be offered as a service.